We identify three problems with current techniques for implementing protocols among threads, 
which complicate and impair the scalability of multicore software development: implementing syn- 
chronization, implementing coordination, and modularizing protocols. To mend these deficiencies, 
we argue for the use of domain-specific languages (DSL) based on existing models of concurrency. To 
demonstrate the feasibility of this proposal, we explain how to use the model of concurrency Reo as 
a high-level protocol DSL, which offers appropriate abstractions and a natural separation of protocols 
and computations. We describe a Reo-to-Java compiler and illustrate its use through examples. 



1 Introduction 

With the advent of multicore processors, a new era began for many software developers of general, non- 
numerical, applications: to harness the power of multicore processors, the need for writing concurrently 
executable code, instead of traditional sequential programs, intensified — a notoriously difficult task with 
the currently popular tools and technology. To alleviate the burden of implementing concurrent
applications, researchers started developing new techniques for multicore programming. Examples include
stream processing, transactional memory, and lock-free synchronization. However, one rather high-level 
aspect of multicore programming has received only little attention: the sets of rules that interacting par- 
ties must abide by when they communicate with each other — protocols. In this paper, we investigate a 
new approach for implementing protocols among threads. 

Many popular general-purpose programming languages (GPPL) feature threads: concurrently execut- 
ing program fragments sharing the same address space. To name a few such GPPLs and (some of) their 
multithreading facilities: 

• Fortran has coarrays and OpenMP; 

• C and C++ have Pthreads and OpenMP; 

• Objective-C has Pthreads and the NSThread class; 

• Visual Basic and C# have the System.Threading namespace; 

• Java has the Thread class. 

These languages have a combined share of roughly 63% according to the TLP and TIOBE indexes of
January 2012. From these statistics, one can conclude that (a good portion of) sixty percent of software
developers encounter threads regularly. Consequently, many developers will benefit from improvements
to existing techniques for implementing protocols among threads. We consider such improvements not 
merely relevant but a sheer necessity: the current models and languages, APIs and libraries, fail to scale 
when it comes to implementing protocols. Importantly, we refer to scalability not only in terms of
performance but also in terms of other aspects of software development (e.g., correctness, maintainability,
and reusability). Our approach in this paper takes such aspects into account. 

Organization. In Section 2, we identify three problems with current techniques for implementing
protocols among threads. In Section 3, we sketch an abstract solution based on the general notion of
domain-specific languages. In Section 4, we concretize our approach for one particular such language,
namely Reo. Section 5 concludes this paper.

2 Problems with Implementing Protocols 

While threads prevail for implementing concurrency in general-purpose programming languages (GPPL), 
they also provoke controversy. Programming with threads would inflict unreasonable demands on the 
reasoning capabilities of software developers, partly due to the unpredictable ways in which threads 
interact with each other [7]: typically, one cannot analyze all the ways in which threads may interleave, 
and consequently, unforeseen — and potentially dangerous — execution paths may exist. Some propose 
to discard our present notion of threads, unless we improve our ways of programming with them [?].
Because many existing GPPLs support threads — and since this seems unlikely to change in the near 
future — we gear our efforts toward getting such improvements. In particular, our interest lies in solving 
problems with implementing protocols among threads. In this section, we identify three such problems. 

At first sight, writing the computation code and the protocol code of a program using a single lan- 
guage may seem only natural. Indeed, many popular GPPLs have sufficient expressive power for doing 
so. Nevertheless, we consider it an inappropriate approach in many cases: typically, language designers 
gear GPPLs toward implementing computations. Implementing protocols, at a suitable level of abstrac- 
tion, seems a secondary concern. Consequently, these languages work well for writing computation 
code, but not so for developing protocol code: the low-level concurrency constructs that they provide 
do not coincide with the higher-level concepts needed to express protocols directly. This results in two 
problems that complicate "writing code" for protocols. 

Problem 1 (Implementing synchronization) Threads communicating with each other using a shared 
memory, by directly reading from and writing to their common address space, must synchronize their 
actions. However, implementing synchronization using primitives such as locks, mutexes, semaphores,
etc., is a tedious and error-prone activity.

Problem 2 (Implementing coordination) Threads interacting with each other in structured ways, ac- 
cording to some protocol, require coordination to ensure that they respect this protocol. However, imple- 
menting coordination using constructs such as assignments, if-statements, while-loops, etc., produces 
code that only indirectly conveys a protocol, which makes it a tedious and error-prone activity.

Interestingly, these two problems have a common cause: the lack of appropriate abstractions for im- 
plementing communication and interaction in GPPLs. For example: software developers should specify 
that a thread sends two integers and receives an array of rationals for a response — not that a thread allo- 
cates shared memory and performs pointer arithmetic. Or: developers should specify that communication 
between two threads inhibits interaction among other threads — not that threads acquire and release locks. 
Or: developers should specify that threads exchange data elements synchronously (i.e., atomically) — not 
that threads wait on a monitor until they get notified. We believe that programming languages should 
enable developers implementing communication and interaction to focus on the logic of the protocols 
 1  import java.util.LinkedList;
 2  import java.util.concurrent.Semaphore;
 3
 4  public class Main {
 5    private LinkedList<Object> buffer;
 6    private Semaphore notEmpty;
 7    private Semaphore notFull;
 8
 9    public Main() {
10      buffer = new LinkedList<Object>();
11      notEmpty = new Semaphore(0);
12      notFull = new Semaphore(1);
13      (new Producer()).start();
14      (new Producer()).start();
15      (new Consumer()).start();
16    }
17    private class Producer extends Thread {
18      public void run() {
19        while (true) {
20          Object d = produce();
21          notFull.acquire();
22          buffer.offer(d);
23          notEmpty.release();
24    } } }
25    private class Consumer extends Thread {
26      public void run() {
27        while (true) {
28          notEmpty.acquire();
29          Object d = buffer.poll();
30          notFull.release();
31          consume(d);
32  } } } }



Figure 1: Java implementation of the producer-consumer example in [?, Algorithm 6.8].



involved — not on the realization of the necessary synchronization and coordination. We (should) have 
compilers for that. 

In addition to the two problems identified above, the lack of appropriate abstractions in GPPLs causes 
a third problem: in the absence of proper structures to enforce (or at least encourage) good protocol 
programming practices, programmers frequently succumb to the temptation of not isolating protocol 
code. Conceptually, this problem differs from Problems 1 and 2 because it does not complicate "writing
code" directly. However, it does perplex essentially everything else involved in a software development
process. Although notions such as "modularization" [22] and "separation of concerns" [10] have long
histories in computer science, linguistic support for their application in programming of concurrency
protocols has scarcely received due attention. Modularization and separation of concerns have driven
the development of modern programming languages and software development practices for decades. In
fact, already in the early 1970s, Parnas attributed three advantages to abiding by these principles [22]:

"(1) managerial — development time should be shortened because separate groups would 
work on each module with little need for communication; (2) product flexibility — it should 
be possible to make drastic changes to one module without a need to change others; (3) 
comprehensibility — it should be possible to study the system one module at a time." 

Nevertheless, popular GPPLs do not enforce modularization of protocols. Consequently, dispersing pro- 
tocol code among computation code comprises a common practice for implementing protocols among 
threads. To illustrate such dispersion — and its deficiencies — we discuss the producer-consumer Java 
code in Figure 1 (based on [?, Algorithm 6.8]). Two producer threads produce data elements and append
them to a queue buffer (of size 1). Concurrently, a consumer thread takes elements from this queue 
and consumes them. While the queue buffer contains an element, the producers cannot append data 
until the consumer takes this element out of the queue. (We skip the methods produce and consume.) 
Easily, one can get the gist of the protocol involved in this example: the producers send — asynchro- 
nously, reliably, and in arbitrary order — data elements to the consumer. In contrast, one cannot easily 
point to coherent segments of the source code that actually implement this protocol. Indeed, only the 



combination of lines 10-12, 21-23, and 28-30 does so. In this example, thus, we have not isolated



the protocol in a distinct module; we have not separated our concerns. Therefore, the "Advantages of 
Modularization" identified by Parnas do not apply. In fact, the monolithic program in Figure 1 suffers
from their opposites — the "Disadvantages of Dispersion:" 

(1) Groups cannot work independently on computation code and protocol code of monolithic pro- 
 1  public interface Protocol {
 2    public void offer(Object o);
 3    public Object poll();
 4  }
 5  public class Main {
 6    private Protocol protocol;
 7    public Main() {
 8      protocol = new P();
 9      (new Producer()).start();
10      (new Producer()).start();
11      (new Consumer()).start();
12    }
13    private class Producer extends Thread {
14      public void run() {
15        while (true) {
16          Object d = produce();
17          protocol.offer(d);
18    } } }
19    private class Consumer extends Thread {
20      public void run() {
21        while (true) {
22          Object d = protocol.poll();
23          consume(d);
24  } } } }



Figure 2: Reimplementation of the producer-consumer program in Figure 1.



grams. Moreover, one cannot straightforwardly reuse computation code or protocol code of mono- 
lithic programs in other programs: this would require dissecting and disentangling the former. 

(2) Small changes to a protocol require nontrivial changes throughout a monolithic program. For 
instance, suppose that we allow the producers in the producer-consumer example to send data 
elements only in alternating order. Implementing such turn-taking requires significant changes. 

(3) One cannot study computation code and protocol code in isolation when they are entangled: to 
reason about protocol correctness, one must analyze monolithic programs in their entirety. 

The impact of these shortcomings only increases when programs grow larger, interaction among threads 
intensifies, and protocol complexity increases — a likely situation in the current multicore era. 
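
Item (2) can be made concrete with a minimal sketch in the semaphore-based style of Figure 1. It is our own illustration, not code from the cited algorithm: the class TurnTakingDemo, the semaphores turnA and turnB, and the bounded rounds parameter are hypothetical names introduced here. To make the two producers alternate, new protocol state and two extra protocol statements have to be woven into the producers' computation loop.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: turn-taking retrofitted onto the monolithic style of Figure 1.
class TurnTakingDemo {
    private final LinkedList<Object> buffer = new LinkedList<>();
    private final Semaphore notEmpty = new Semaphore(0);
    private final Semaphore notFull = new Semaphore(1);
    // New protocol state (assumption): one semaphore per producer encodes whose turn it is.
    private final Semaphore turnA = new Semaphore(1);
    private final Semaphore turnB = new Semaphore(0);

    private Thread producer(String name, Semaphore myTurn, Semaphore otherTurn, int rounds) {
        return new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    Object d = name + i;     // computation (stands in for produce())
                    myTurn.acquire();        // protocol: wait for my turn (new)
                    notFull.acquire();       // protocol: as in Figure 1
                    buffer.offer(d);
                    notEmpty.release();
                    otherTurn.release();     // protocol: pass the turn (new)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    /** Runs two alternating producers; the calling thread acts as the consumer. */
    static List<Object> run(int rounds) {
        TurnTakingDemo demo = new TurnTakingDemo();
        Thread a = demo.producer("a", demo.turnA, demo.turnB, rounds);
        Thread b = demo.producer("b", demo.turnB, demo.turnA, rounds);
        a.start();
        b.start();
        List<Object> consumed = new ArrayList<>();
        try {
            for (int i = 0; i < 2 * rounds; i++) {
                demo.notEmpty.acquire();
                Object d = demo.buffer.poll();
                demo.notFull.release();
                consumed.add(d);             // computation (stands in for consume(d))
            }
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(run(3)); // prints [a0, b0, a1, b1, a2, b2]
    }
}
```

Even in this small program, the change touches the field declarations, the producer loop, and the way producers are constructed, which is the kind of scattered modification that item (2) warns about.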

Problem 3 (Modularizing protocol code) The lack of appropriate abstractions in GPPLs tempts devel- 
opers to disperse protocol code among code of computational tasks. In that case, developers do not 
isolate protocols in modules (e.g., classes, packages, namespaces), but intermix them with computations. 
This practice makes independently developing, maintaining, reusing, modifying, testing, and verifying 
protocol code problematic or impossible. 

To avoid the Disadvantages of Dispersion, we propose to isolate protocol code in separate modules. 
In object-oriented languages, for instance, one can achieve this by encapsulating all the protocol logic in a 
separate class. To illustrate this approach, we rewrote the monolithic program in Figure 1 as the modular
program in Figure 2: we moved all the protocol code to a class P (see Section 4.2 for its implementation),
which implements the interface Protocol.³ To such programs, the Advantages of Modularization
apply. First, groups can develop protocol code (e.g., the implementations of the methods offer and 
poll) independently from computation code. Moreover, one can easily reuse protocol implementations. 
Second, changing the protocol requires changing only the class implementing the protocol (e.g., the class 
P); computation code, however, remains unaffected. Third, we can analyze the protocol in isolation by 
studying only the class implementing the protocol (e.g., the class P). 
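
As a point of comparison, the following minimal sketch shows one way to hand-write such a protocol class without Reo. It is not the class P of Section 4.2: QueueProtocol, ModularDemo, and the bounded perProducer parameter are hypothetical, and java.util.concurrent.ArrayBlockingQueue merely stands in for the one-place buffer. The interface Protocol is repeated from Figure 2 so that the block is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

interface Protocol {
    void offer(Object o);
    Object poll();
}

// Hypothetical hand-written protocol module: all synchronization lives here.
class QueueProtocol implements Protocol {
    private final BlockingQueue<Object> buffer = new ArrayBlockingQueue<>(1);

    public void offer(Object o) {
        try {
            buffer.put(o);               // blocks while the one-place buffer is full
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public Object poll() {
        try {
            return buffer.take();        // blocks while the buffer is empty
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}

class ModularDemo {
    /** Computation code: it only calls offer and poll, whatever the protocol is. */
    static List<Object> run(Protocol protocol, int perProducer) {
        for (int p = 0; p < 2; p++) {
            final String name = "p" + p;
            new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    protocol.offer(name + "-" + i);
                }
            }).start();
        }
        List<Object> consumed = new ArrayList<>();
        for (int i = 0; i < 2 * perProducer; i++) {
            consumed.add(protocol.poll());
        }
        return consumed;
    }

    public static void main(String[] args) {
        // The order in which the two producers' elements arrive is not fixed.
        System.out.println(run(new QueueProtocol(), 3));
    }
}
```

Passing a different Protocol implementation to run changes the protocol without touching the producer or consumer code, which illustrates the second Advantage of Modularization.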



3 Solution: Protocol DSLs 



In the previous section, we explained how the lack of appropriate abstractions complicates three
aspects of implementing protocols: implementing synchronization (Problem 1), implementing coordination
(Problem 2), and modularizing protocol code (Problem 3). We believe that domain-specific languages
(DSL) offer a solution to these problems.

3 The definition of the interface Protocol in Figure 2 serves only our present discussion: not every protocol has methods
offer and poll. In general, the interface of a protocol should provide methods that computation code can invoke for executing
this protocol. In the context of our present discussion, offer and poll seemed appropriate names.

Modularizing and Specifying Protocols among Threads

Definition 1 (Domain-specific language [8]) A domain-specific language is a programming language
that offers, through appropriate notations and abstractions, expressive power focused on, and usually 
restricted to, a particular problem domain. 

Domain-specific languages dedicated to the implementation of protocols solve Problems 1 and 2 by
this very definition. Moreover, such protocol DSLs naturally force developers to isolate their protocols 
in modules: specifying protocols in a different language leads to a clear syntactic separation between 
computation code and protocol code. Consequently, using protocol DSLs in the following workflow 
secures the Advantages of Modularization. 

• Developers write the computation code of an application in a GPPL. 

• Developers specify protocols among threads in a protocol DSL. 

• A DSL compiler compiles protocol specifications to GPPL code, seamlessly integrating protocols
with computations.

While the benefits seem clear, one question remains: where to get these protocol DSLs from? Must we 
invent them from scratch? And if so, what kinds of "appropriate abstractions" should they provide? 

Fortunately, we do not need to design everything from the ground up: interaction and concurrency 
have received plenty of attention from the theoretical computer science community over the past decades, 
and researchers have investigated high-level models of concurrency for many years (e.g., Petri nets, 
process algebras). This led to various formalisms for synchronizing and coordinating parties running 
concurrently (e.g., actors, agents, components, services, processes, etc.). We believe that these models 
of concurrency provide appropriate abstractions for specifying and reasoning about protocols (albeit, not 
all do so equally well). In other words, the protocol DSLs that we look for already exist. However, many 
of these formalisms lack sophisticated tool support, and in particular, the kind of compiler mentioned 
above. Therefore, we consider implementing such code generation tools among the main goals in our 
efforts toward alleviating the burden of programming with threads. But which existing concurrency 
formalism should we focus on? 

4 Reo as a Protocol DSL 

One model of concurrency has our particular interest: Reo [?], an interaction-based model of
concurrency with a graphical syntax, originally introduced for coordinating components in component-based
systems. As with other models of concurrency, Reo has a solid foundation: there exist various
compositional semantics [?] for describing the behavior of Reo programs, called connectors, along with tools for
analyzing them. This includes both functional analysis (detecting deadlock, model-checking) and
reasoning about nonfunctional properties (computing quality-of-service guarantees). Its declarative nature,
however, distinguishes Reo from other models of concurrency. Using Reo, software developers specify 
what, when, where, and why interaction takes place; not how. Indeed, Reo does not feature primitive 
actions for sending or receiving data elements. Rather, Reo considers interaction protocols as constraints 
on such actions. In stark contrast to traditional models of concurrency, Reo's constraint-based notion of 
interaction has the advantage that to formulate (specify, verify, etc.) protocols, one does not need to even 
consider any of the alternative sequences of actions that give rise to them. 

Using Reo, computational threads remain completely oblivious to protocols that compose them into, 
and coordinate their interactions within, a concurrent application: their code contains no concurrency 
Figure 3: Syntax and semantics, as a port automaton, of MergerWithBuffer.
Figure 4: Syntax and semantics, as a port automaton, of AlternatorWithBuffer.



primitive (e.g., semaphore operations, signals, mutex, or even direct communication as in send/receive). 
The sole means of communication for a computational thread consists of I/O actions that it performs 
on its own input/output ports. To construct an application, one composes a set of such threads together 
with a protocol by identifying the input/output ports of the computational threads with the appropriate 
output/input nodes of a Reo connector that implements the protocol. A Reo compiler then generates the 
proper multithreaded application code. 

We proceed as follows. In Section 4.1, we explain the main concepts of Reo through three example
connectors, each of which represents a protocol that one can use in the producer-consumer example of
Section 2. In Section 4.2, we discuss our Reo-to-Java compiler.



4.1 Reo by Example: Producer-Consumer Protocols 



Figures 3a, 4a, and 5a show three example connectors (i.e., Reo programs): graphs of nodes and arcs,
which we refer to as channels. We refer to nodes that admit I/O operations as boundary nodes and draw 
them as open circles in figures. Intuitively, one can interpret the graph representing a Reo connector as 
follows: data elements, dispatched on input (boundary) nodes by output operations, move along arcs to 
other nodes, which replicate them if they have multiple outgoing channels, along to output (boundary) 
nodes, from which input operations can fetch them. Groups of such (input, output, and transport) activ- 
ities may take place atomically. Importantly, communicating parties that perform I/O operations on the 
boundary nodes of a connector remain oblivious to how the connector routes data: parties that fetch or 
dispatch data elements do not know where these elements come from or go to. 



The connector in Figure 3a specifies the same protocol as the one embedded in the Java code in
Figure 1. We can explain the behavior of this connector, named MergerWithBuffer, best by discussing
the port automaton [20] that describes its semantics. Figure 3b shows this automaton (derived
automatically from Figure 3a): every state corresponds to an internal configuration of MergerWithBuffer, while
every transition describes a step of the protocol specified by MergerWithBuffer. Transitions carry a
synchronization constraint: a set containing those nodes through which a data element passes in an atomic
protocol step. Thus, in the initial state of MergerWithBuffer, a data element passes either nodes A and
C or nodes B and C. Every element that passes C subsequently arrives at a buffer with capacity 1. We
represent this buffer with a rectangle in Figure 3a. While the buffer remains full, no data elements can
pass A, B, or C. In that case, the only admissible step is for the element stored in the buffer to leave the
buffer and pass through D.
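
The port automaton just described can be written down as plain data. The following is a minimal sketch under our own assumptions: the class PortAutomatonSketch, the state names EMPTY and FULL, and the method step are hypothetical, and the transition table is transcribed from the prose above rather than taken from the ECT.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical encoding of the port automaton of MergerWithBuffer as described in the text.
class PortAutomatonSketch {
    static final class Transition {
        final String from;
        final String to;
        final Set<String> sync;   // synchronization constraint: nodes firing atomically

        Transition(String from, String to, String... nodes) {
            this.from = from;
            this.to = to;
            this.sync = new HashSet<>(Arrays.asList(nodes));
        }
    }

    static final List<Transition> MERGER_WITH_BUFFER = Arrays.asList(
        new Transition("EMPTY", "FULL", "A", "C"),   // producer on A fills the buffer
        new Transition("EMPTY", "FULL", "B", "C"),   // producer on B fills the buffer
        new Transition("FULL", "EMPTY", "D"));       // consumer on D empties the buffer

    /** Returns the successor state if exactly the given nodes may fire, or null otherwise. */
    static String step(String state, String... fired) {
        Set<String> nodes = new HashSet<>(Arrays.asList(fired));
        for (Transition t : MERGER_WITH_BUFFER) {
            if (t.from.equals(state) && t.sync.equals(nodes)) {
                return t.to;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(step("EMPTY", "A", "C")); // prints FULL
        System.out.println(step("FULL", "B", "C"));  // prints null: the buffer is full
        System.out.println(step("FULL", "D"));       // prints EMPTY
    }
}
```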

Figure 4 shows another connector, named AlternatorWithBuffer, that one can use in the
producer-consumer example. In contrast to MergerWithBuffer, AlternatorWithBuffer forces the producers to
(a) Syntax (with φ = [A] = "foo"). (b) Semantics.

Figure 5: Syntax and semantics, as a constraint automaton, of SequencerWithBuffer_φ.



synchronize (represented by the arrow-tailed edge between nodes A and B): only if they can dispatch data 
elements simultaneously, the connector allows them to do so. In that case, the data element dispatched on 
A passes node C and enters the horizontal buffer; concurrently, the data element dispatched on B enters 
the diagonal buffer. In the next protocol step, the data element in the horizontal buffer leaves this buffer 
and passes node D. Subsequently, the data element in the diagonal buffer leaves this buffer, passes C, and 
enters the horizontal buffer. Finally, the data element now in the horizontal buffer (originally dispatched 
on B) leaves this buffer and passes D. Thus, AlternatorWithBuffer first synchronizes the producers, and
second, it offers their data elements in alternating order to the consumer. 
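
The sequence of steps described above can be mimicked by a small sequential model. This is a minimal sketch under our own assumptions, not the semantics shown in Figure 4: the class AlternatorSketch and its methods dispatch and fetch are hypothetical, data elements are assumed to be non-null, and the internal step through C is folded into fetch because no other step can occur between the two.

```java
// Hypothetical sequential model of the protocol steps of AlternatorWithBuffer.
class AlternatorSketch {
    private Object horizontal;   // buffer between C and D
    private Object diagonal;     // buffer between B and C

    /** Both producers dispatch at once; enabled only while both buffers are empty. */
    boolean dispatch(Object fromA, Object fromB) {
        if (horizontal != null || diagonal != null) {
            return false;
        }
        horizontal = fromA;
        diagonal = fromB;
        return true;
    }

    /** The consumer fetches on D; the diagonal element, if any, then moves via C. */
    Object fetch() {
        Object d = horizontal;
        horizontal = diagonal;
        diagonal = null;
        return d;                // null if there was nothing to fetch
    }

    public static void main(String[] args) {
        AlternatorSketch alternator = new AlternatorSketch();
        System.out.println(alternator.dispatch("a", "b")); // prints true
        System.out.println(alternator.dispatch("c", "d")); // prints false
        System.out.println(alternator.fetch());            // prints a
        System.out.println(alternator.fetch());            // prints b
    }
}
```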

Figure 5 shows a third connector, named SequencerWithBuffer_φ, that one can use in the
producer-consumer example. The protocol specified by this connector differs in two significant ways from
MergerWithBuffer and AlternatorWithBuffer. The first difference relates to (the lack of) synchronization:
unlike AlternatorWithBuffer, SequencerWithBuffer_φ does not force the producers to synchronize before
they dispatch their data elements. (Similar to AlternatorWithBuffer, however, SequencerWithBuffer_φ
orders the sequence in which data elements arrive at the consumer.) The second difference relates to the
data-sensitivity that SequencerWithBuffer_φ exhibits: the zigzagged edge in Figure 5a represents a filter
channel, and we call φ a filter constraint: only those data elements satisfying its filter constraint
propagate through a filter channel. In this example, we assume a simple filter constraint, namely [A] = "foo",
which means: the data element passing A equals the string "foo". In other words, if a producer
dispatches "foo" on A, the (right-horizontal) buffer becomes filled with "foo"; otherwise, the filter loses the
dispatched data element, which essentially means that its producer has wasted its turn. In general, filter
channels facilitate the specification of protocols whose execution depends on the content of the exchanged
data.

Port automata cannot express the semantics of connectors with filter channels. For that, we need a
stronger formalism: constraint automata (CA) [?], which support richer transition structures than port
automata. Instead of only a synchronization constraint, transitions of CA carry also a data constraint: an
expression about what the data passing particular nodes should look like in some protocol step. Figure 5b
shows the constraint automaton that describes the semantics of SequencerWithBuffer_φ (we omitted its
nonboundary nodes from this CA). The symbols m and m' refer to the content of the (right-horizontal)
buffer while a transition fires and after it has fired: m' = [A] means that this buffer contains the value
exchanged through A after a transition; [D] = m means that the content of this buffer passes through D
during a transition.
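
To show how a data constraint extends a port-automaton transition, the following minimal sketch encodes a constraint automaton with a single memory cell. All names (ConstraintAutomatonSketch, Transition, step, filterIntoBuffer) are hypothetical, and the example automaton is a deliberately simplified filter channel feeding a one-place buffer. It is not the automaton of Figure 5b, which also sequences the two producers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

// Hypothetical encoding of a constraint automaton with one memory cell m.
class ConstraintAutomatonSketch {
    static final class Transition {
        final String from;
        final String to;
        final Set<String> sync;                                        // synchronization constraint
        final BiPredicate<Map<String, Object>, Object> data;           // data constraint over node data and m
        final BiFunction<Map<String, Object>, Object, Object> update;  // computes m'

        Transition(String from, String to, Set<String> sync,
                   BiPredicate<Map<String, Object>, Object> data,
                   BiFunction<Map<String, Object>, Object, Object> update) {
            this.from = from;
            this.to = to;
            this.sync = sync;
            this.data = data;
            this.update = update;
        }
    }

    private final List<Transition> transitions;
    String state;
    Object m;

    ConstraintAutomatonSketch(String initial, List<Transition> transitions) {
        this.state = initial;
        this.transitions = transitions;
    }

    /** Fires the first transition enabled by the offered data; returns false if none is. */
    boolean step(Map<String, Object> offered) {
        for (Transition t : transitions) {
            if (t.from.equals(state) && t.sync.equals(offered.keySet()) && t.data.test(offered, m)) {
                m = t.update.apply(offered, m);
                state = t.to;
                return true;
            }
        }
        return false;
    }

    /** A filter channel with constraint [A] = "foo" feeding a one-place buffer (simplified). */
    static ConstraintAutomatonSketch filterIntoBuffer() {
        Set<String> a = Collections.singleton("A");
        Set<String> d = Collections.singleton("D");
        return new ConstraintAutomatonSketch("EMPTY", Arrays.asList(
            // [A] = "foo" and m' = [A]: the datum enters the buffer
            new Transition("EMPTY", "FULL", a,
                (x, mem) -> "foo".equals(x.get("A")), (x, mem) -> x.get("A")),
            // [A] != "foo": the filter loses the datum
            new Transition("EMPTY", "EMPTY", a,
                (x, mem) -> !"foo".equals(x.get("A")), (x, mem) -> mem),
            // [D] = m: the buffer content passes through D
            new Transition("FULL", "EMPTY", d,
                (x, mem) -> Objects.equals(x.get("D"), mem), (x, mem) -> null)));
    }

    public static void main(String[] args) {
        ConstraintAutomatonSketch ca = filterIntoBuffer();
        ca.step(Collections.singletonMap("A", "bar"));
        System.out.println(ca.state); // prints EMPTY: "bar" was lost
        ca.step(Collections.singletonMap("A", "foo"));
        System.out.println(ca.state); // prints FULL
    }
}
```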



Alternatively, one can formulate filter constraints as regular expressions or patterns. See Q]. 
4.2 Compiling Reo to Java 

Next, we discuss how to use Reo as a DSL for implementing protocols: we present here an early version 
of our Reo-to-Java compiler, because it is simpler to explain. Although we focus on Java here, we 
emphasize the generality of our approach: nothing in Reo prevents us from compiling Reo to Fortran, C, 
C++, Objective-C, C#, or Visual Basic. 

Before we can compile anything, we need a means to implement the "paper-and-pencil drawings" of 
connectors. We use the Reo IDE for this purpose, called the Extensible Coordination Tools (ECT). The
ECT consists of a collection of Eclipse plug-ins, including a drag-and-drop editor for drawing connector 
diagrams. Under the hood, the ECT stores and manipulates such diagrams as XML documents. These 
XML documents serve as input to our Reo-to-Java compiler, detailed next. 

Previously, we introduced Reo in terms of how data elements move through a connector, not unlike 
dataflow programming. Compiling connectors to some kind of distributed application, therefore, may 
seem an obvious choice: nodes naturally map to processing elements (e.g., cores), and the connections 
between processing elements can serve as channels. However, this approach has several shortcomings. 
First, the network topology of the hardware may not correspond with the topology of the connector 
that we want to deploy. Second, emulating Reo channels with hardware connections requires additional 
computations from the processing elements connected. This destroys the original idea of mapping Reo 
channels to hardware connections. Third, achieving the global atomicity, synchronization, and exclusion 
emerging in a connector requires complex distributed algorithms. Such algorithms inflict communication 
and processing overhead, deteriorating performance. In short, construing connectors and their topology 
too literally seems a bad idea in the context of compilation. Instead, our Reo-to-Java compiler compiles 
connectors based on their constraint automaton (CA) semantics. 

The ECT ships with the CA of many common channels, including those in Figures 3, 4, and 5. By
combining such primitive CA, through the act of composition [3], the ECT can automatically compute the
CA of larger connectors. We use this open source CA library in our Reo-to-Java compiler: on input of an
XML document specifying a connector, our compiler first computes the corresponding CA. Subsequently,
it annotates this CA with Java identifiers. Finally, it produces a Java class using Antlr's StringTemplate
technology [23]. One can use the resulting class as any Java class. By using CA for compiling connectors,
we conveniently abstract away their nonboundary nodes.

To illustrate the compilation process, suppose that we want to compile MergerWithBuffer for use
in the producer-consumer example of Section 2. After drawing MergerWithBuffer using the ECT, we
feed the corresponding XML document to our Reo-to-Java compiler. This tool automatically generates
a Java class MergerWithBuffer based on the CA semantics of MergerWithBuffer. More precisely,
MergerWithBuffer objects run state machines representing the protocol specified by
MergerWithBuffer. Figure 6 shows (parts of) the Java class generated by compiling MergerWithBuffer. We discuss
some of its salient aspects.

 1  public class MergerWithBuffer extends Thread {
 2    /* The current state. */
 3    private State current = State.EMPTY;
 4    private enum State { FULL, EMPTY }
 5
 6    /* The boundary nodes of this connector. */
 7    private Port A; private Port B; private Port D;
 8
 9    /* The data constraints this connector checks and */
10    /* the memory cells this connector has access to. */
11    // --snip--
12
13    /* A random number generator for selecting transitions. */
14    Random random = new Random();
15
16    /* Constructs a MergerWithBuffer. */
17    public MergerWithBuffer(Port A, Port B, Port D) {
18      this.A = A; this.B = B; this.D = D;
19
20      /* Initialize data constraints. */
21      // --snip--
22    }
23
24    /* Runs the state machine modeling MergerWithBuffer. */
25    public void run() {
26      while (true) {
27        switch (current) {
28        case EMPTY:
29          switch (random.nextInt(2)) {
30          case 0: tFromEmptyToFullA(); break;
31          case 1: tFromEmptyToFullB(); break;
32          }
33          break;
34        case FULL:
35          // --snip--
36          break;
37    } } }
38    /* Makes a transition from state EMPTY to state FULL, firing A. */
39    private void tFromEmptyToFullA() {
40
41      /* Lock and get pending writes. */
42      Set<Write> writesOnA = A.lockAndGetWrites();
43      if (writesOnA.isEmpty()) { abort(); return; }
44
45      /* Lock and get pending takes. */
46      Set<Take> takesOnD = D.lockAndGetTakes();
47      if (takesOnD.isEmpty()) { abort(); return; }
48
49      /* Check the synchronization and data constraints. */
50      if (/* --snip-- */) {
51
52        /* Process writes and takes. */
53        A.performAndUnlock(/* --snip-- */);
54        D.performAndUnlock(/* --snip-- */);
55
56        /* Update memory cells. */
57        // --snip--
58
59        /* Update state. */
60        current = State.FULL;
61      }
62      abort();
63    }
64
65    /* Makes a transition from state EMPTY to state FULL, firing B. */
66    private void tFromEmptyToFullB() {
67      // --snip--
68    }
69
70    /* Aborts a transition by unlocking all that it may have locked. */
71    private void abort() {
72      A.unlockWrites(); B.unlockWrites();
73      D.unlockTakes();
74  } }

Figure 6: (Parts of the) Java class generated by compiling MergerWithBuffer (see also Figure 3).

• The class MergerWithBuffer extends the class Thread (line 1). By running connectors in
their own thread, we enable them to proactively sense their environment for I/O operations; with
massive-scale concurrent hardware, ample cores to run connectors on should always exist.

• Instances of MergerWithBuffer listen to three ports (line 7), which grant "computation threads"
access to the boundary nodes of MergerWithBuffer. All interaction between computation threads
and a MergerWithBuffer object occurs through the latter's ports: computation threads can perform
I/O operations (writes and takes) on ports, which in turn suspend threads until their operations
succeed. More technically, ports extend concurrent data structures called synchronization points:
pairs of sets (one containing pending writes, another containing pending takes) that support,
and are amenable to, two-phase locking schemes (see below) [5].

The class SyncPoint exposes the following methods:

- lockAndGetWrites() locks and returns the set of pending write operations (line 42).

- lockAndGetTakes() locks and returns the set of pending take operations (line 46).

- unlockWrites() and unlockTakes() unlock the sets of writes and takes (lines 72-73).

- performAndUnlock(Write) performs the specified write operation (first parameter) and
unlocks the set of pending write operations (line 53).

- performAndUnlock(Take, Object) performs the specified take operation (first parameter)
by passing it the data element to take (second parameter) and unlocks the set of pending take
operations (line 54).

• The overridden method run() implements a state machine corresponding to the input CA of the
compilation process (lines 25-37). The main loop never terminates (line 26). In each iteration,



6 Synchronization points resemble π-calculus channels.
 1 public class P implements Protocol {
 2   private Port A = new Port();
 3   private Port B = new Port();
 4   private Port D = new Port();
 5   private Map<Thread, Port> threads =
 6     new ConcurrentHashMap<Thread, Port>();
 7
 8   public P() {
 9     new MergerWithBuffer(A, B, D).start();
10   }
11   public Object poll() { return D.take(); }
12   public void offer(Object o) {
13     Thread thread = Thread.currentThread();
14     if (!threads.containsKey(thread))
15       synchronized (this) {
16         threads.put(thread,
17           !threads.containsValue(A) ? A : B);
18       }
19     threads.get(thread).write(o);
20 } }



Figure 7: Class P. 



it randomly selects a transition (line 29) going out of the current state (line 3). We collect code
responsible for making transitions in separate methods (lines 39-63 and 66 onward).
An important step in the process of making a transition consists of checking its synchronization
and data constraints (line 50). To do this in a thread-safe manner, a MergerWithBuffer object
employs a two-phase locking scheme. During the growing phase, it acquires the locks of:

- the set of pending writes of each port providing access to an input node (line 42);

- the set of pending takes of each port providing access to an output node (line 46).

A MergerWithBuffer object locks only the sets of those boundary nodes that actually occur in
the constraints under investigation. Later, during the shrinking phase, it releases these locks again
(lines 43, 47, and 62). Only between the two phases does a MergerWithBuffer object check the
constraints under investigation. If they hold, it fires the corresponding transition, transporting data
elements accordingly and removing the operations involved from the sets it has locked (lines 53-54).
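
To make the two-phase locking scheme concrete, the following is a minimal sketch of our own, not the code our compiler generates. The classes TwoPhaseSketch and SyncPoint, the helpers addWrite and addTake, and the method tryTransfer are hypothetical names introduced only for illustration; pending operations are registered without blocking here, whereas write and take on real ports block. The sketch moves one data element from a pending write on an input port to a pending take on an output port.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

class TwoPhaseSketch {

    /** Hypothetical, simplified synchronization point (an assumption, not the paper's SyncPoint). */
    static final class SyncPoint {
        private final ReentrantLock writesLock = new ReentrantLock();
        private final ReentrantLock takesLock = new ReentrantLock();
        private final Deque<Object> writes = new ArrayDeque<>();   // data of pending writes
        private final Deque<Object[]> takes = new ArrayDeque<>();  // one result slot per pending take

        /** Registers a pending write (non-blocking here, unlike a real port's write). */
        void addWrite(Object datum) {
            writesLock.lock();
            try { writes.add(datum); } finally { writesLock.unlock(); }
        }

        /** Registers a pending take and returns the slot that will receive its datum. */
        Object[] addTake() {
            Object[] slot = new Object[1];
            takesLock.lock();
            try { takes.add(slot); } finally { takesLock.unlock(); }
            return slot;
        }

        Deque<Object> lockAndGetWrites() { writesLock.lock(); return writes; }
        Deque<Object[]> lockAndGetTakes() { takesLock.lock(); return takes; }

        // Unlocking is idempotent so that an abort may release "all it may have locked".
        void unlockWrites() { if (writesLock.isHeldByCurrentThread()) writesLock.unlock(); }
        void unlockTakes() { if (takesLock.isHeldByCurrentThread()) takesLock.unlock(); }

        /** Removes the given pending write, then unlocks the set of writes. */
        void performAndUnlock(Object write) { writes.remove(write); unlockWrites(); }

        /** Hands the datum to the given pending take, then unlocks the set of takes. */
        void performAndUnlock(Object[] take, Object datum) {
            take[0] = datum;
            takes.remove(take);
            unlockTakes();
        }
    }

    /** Tries to fire one transition: a pending write on `in` feeds a pending take on `out`. */
    static boolean tryTransfer(SyncPoint in, SyncPoint out) {
        // Growing phase: acquire every lock the constraint check needs.
        Deque<Object> writes = in.lockAndGetWrites();
        if (writes.isEmpty()) { in.unlockWrites(); return false; }
        Deque<Object[]> takes = out.lockAndGetTakes();
        if (takes.isEmpty()) { in.unlockWrites(); out.unlockTakes(); return false; }

        // Between the phases: the constraint holds, because both operations are pending.
        Object datum = writes.peekFirst();
        Object[] slot = takes.peekFirst();

        // Shrinking phase: perform the operations and release the locks.
        in.performAndUnlock(datum);
        out.performAndUnlock(slot, datum);
        return true;
    }

    public static void main(String[] args) {
        SyncPoint a = new SyncPoint();
        SyncPoint d = new SyncPoint();
        System.out.println(tryTransfer(a, d)); // false: nothing is pending yet
        a.addWrite("x");
        Object[] slot = d.addTake();
        System.out.println(tryTransfer(a, d)); // true: the transition fires
        System.out.println(slot[0]);           // x
    }
}
```

Because no lock is acquired after the first release, the constraint check always sees a consistent snapshot of the pending operations it inspects.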



To use the class MergerWithBuffer in the producer-consumer example of Section 2, we should incor-
porate it in the implementation of the class P, encountered before on line 8 in Figure 2. Figure 7 shows
this implementation. Line 19 specifies that a producer performs a (blocking) write operation on the port
assigned to it; line 11 specifies that a consumer performs a (blocking) take operation. The rest of P con-
sists of initialization code. This structure characterizes the use of Reo as a protocol DSL: implementations
of the Protocol interface serve as wrappers for compiled connectors, encapsulating all the protocol logic.

Changing Figure 1 such that it respects the protocol specified by AlternatorWithBuffer requires
nontrivial modifications across the source code. In contrast, we can straightforwardly implement a class
Q implements Protocol and replace P() with Q() on line 8 in Figure 2. In fact, Q would differ from
P only on line 9 in Figure 7: in Q, we would construct an AlternatorWithBuffer object instead of
a MergerWithBuffer object. Similarly, we can use the protocol specified by SequencerWithBuffer.
This shows that, using Reo, we can easily change protocols without affecting computation code.⁷
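
The following minimal sketch illustrates this separation under stated assumptions; it is not output of our compiler. The shape of the Protocol interface is inferred from the methods of class P, with an added throws clause for brevity. BufferedProtocol and RendezvousProtocol are hypothetical stand-ins that wrap standard java.util.concurrent queues instead of compiled connectors, and ProtocolSwapSketch is a hypothetical name for the computation code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

/** Assumed shape of the Protocol interface (inferred from class P; the throws clauses are ours). */
interface Protocol {
    void offer(Object o) throws InterruptedException;
    Object poll() throws InterruptedException;
}

/** Hypothetical stand-in for a wrapper around a connector with a one-place buffer. */
final class BufferedProtocol implements Protocol {
    private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1);
    public void offer(Object o) throws InterruptedException { queue.put(o); }
    public Object poll() throws InterruptedException { return queue.take(); }
}

/** Hypothetical stand-in for a wrapper around an unbuffered (rendezvous) connector. */
final class RendezvousProtocol implements Protocol {
    private final BlockingQueue<Object> queue = new SynchronousQueue<>();
    public void offer(Object o) throws InterruptedException { queue.put(o); }
    public Object poll() throws InterruptedException { return queue.take(); }
}

final class ProtocolSwapSketch {

    /** Computation code: it depends only on the Protocol interface, never on a concrete protocol. */
    static List<Object> produceAndConsume(Protocol protocol, int n) {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) protocol.offer(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<Object> received = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) received.add(protocol.poll());
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        // Swapping the protocol changes only the constructor call.
        System.out.println(produceAndConsume(new BufferedProtocol(), 3));   // [0, 1, 2]
        System.out.println(produceAndConsume(new RendezvousProtocol(), 3)); // [0, 1, 2]
    }
}
```

As with P and Q, the two calls in main differ only in the constructor they invoke; produceAndConsume stays untouched.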

This subsection demonstrates the feasibility of modularizing protocols and implementing protocol
DSLs. We remark that this approach does not preclude the use of dedicated implementations for certain
parts of a protocol. For instance, consider the buffer of MergerWithBuffer. Our Reo-to-Java compiler
implements this buffer using shared memory and explicit locks (transparent to software developers using
Reo, though). But suppose that the architecture on which we deploy our producer-consumer program
also features hardware transactional memory (HTM) [12]. Our approach allows one to write a dedicated



7 More precisely, handwritten computation code and protocol code generated by our Reo compiler communicate only 
through shared ports; these ports do not change when replacing one connector with another. Thus, unless the number of 
ports changes, a syntactically valid program remains syntactically valid. 
implementation of buffers that exploits this HTM. Subsequently, we can replace the standard buffer
implementation with this dedicated implementation. Thus, besides high-level constructs by default, our
approach offers developers the flexibility of applying lower-level languages if necessary.

5 Concluding Remarks 

Our current work focuses on improving our Reo-to-Java compiler. For instance, the classes currently
generated by our compiler execute sequentially. We can parallelize them rather straightforwardly by
checking ports for appropriate I/O operations concurrently for different transitions. However, we
speculate that we can get even better performance if, instead, we optimize at the semantic level: we wish
to decompose automata into "smaller" automata that can execute concurrently without synchronization
while preserving the original semantics (see [18] for preliminary results). Another potential optimization
involves scheduling: the formal semantics of connectors provide very tangible information for scheduling
threads. Exploiting this information should yield substantial performance gains. We hope that such
improvements will make our approach a competitive alternative to lower-level approaches in terms of
performance as well.

In recent years, session types |[T4l [131 have entered the realm of object-oriented programming (re- 
cent work includes (6l |U HU [I3j HH). Although session types comprise a valuable new technique for 
programming with threads, we wonder whether the abstractions they provide suffice. Still, we consider it
a very interesting development, especially since Reo does not feature types; extending Reo with session
types would constitute a significant improvement.

Finally, although we focused on implementing protocols among threads in this paper, the Reo-to-Java
compiler presented has also proven itself useful in the domain of Web Service orchestration [19].